Overview
Dataset statistics
| Number of variables | 18 |
|---|---|
| Number of observations | 1914116 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 262.9 MiB |
| Average record size in memory | 144.0 B |
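The average record size can be reproduced from the two rows above it. A minimal sketch, assuming 1 MiB = 2^20 bytes; the variable names are illustrative and not part of the report:

```python
# Values copied from the dataset statistics table.
TOTAL_MEMORY_MIB = 262.9
N_OBSERVATIONS = 1_914_116

BYTES_PER_MIB = 2 ** 20  # assumption: MiB means 1,048,576 bytes

# Average bytes per row = total bytes / number of rows.
avg_record_bytes = TOTAL_MEMORY_MIB * BYTES_PER_MIB / N_OBSERVATIONS
print(f"{avg_record_bytes:.1f} B")
```

The result is also what 18 variables at 8 bytes each would give, which suggests (but does not prove, since the report lists no dtypes) that every column is stored as an 8-byte value.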
Variable types
| Numeric | 14 |
|---|---|
| DateTime | 2 |
| Categorical | 2 |
Alerts
| IBNR is highly overall correlated with long | High correlation |
|---|---|
| ID_Timestamp is highly overall correlated with arrival_normalized and 1 other fields | High correlation |
| arrival_delay_m is highly overall correlated with prev_arrival_delay_m and 2 other fields | High correlation |
| arrival_normalized is highly overall correlated with ID_Timestamp and 1 other fields | High correlation |
| departure_normalized is highly overall correlated with ID_Timestamp and 1 other fields | High correlation |
| info_label_encoded is highly overall correlated with transformed_info_message | High correlation |
| long is highly overall correlated with IBNR | High correlation |
| max_station_number is highly overall correlated with stop_number | High correlation |
| prev_arrival_delay_m is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| prev_departure_delay_m is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| station_progress is highly overall correlated with stop_number | High correlation |
| stop_number is highly overall correlated with max_station_number and 1 other fields | High correlation |
| transformed_info_message is highly overall correlated with info_label_encoded | High correlation |
| weighted_avg_prev_delay is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| IBNR has 65408 (3.4%) zeros | Zeros |
| arrival_delay_m has 1358742 (71.0%) zeros | Zeros |
| prev_arrival_delay_m has 1452766 (75.9%) zeros | Zeros |
| prev_departure_delay_m has 1402573 (73.3%) zeros | Zeros |
| weighted_avg_prev_delay has 1053037 (55.0%) zeros | Zeros |
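The percentages in the Zeros alerts are the zero count of each variable divided by the number of observations. A small sketch that recomputes them from the counts quoted in the alerts; the dictionary and variable names are illustrative:

```python
N_OBSERVATIONS = 1_914_116

# Zero counts as quoted in the Zeros alerts.
zero_counts = {
    "IBNR": 65_408,
    "arrival_delay_m": 1_358_742,
    "prev_arrival_delay_m": 1_452_766,
    "prev_departure_delay_m": 1_402_573,
    "weighted_avg_prev_delay": 1_053_037,
}

# Share of zero-valued rows per variable, in percent, rounded to one decimal.
zero_share = {
    name: round(100 * count / N_OBSERVATIONS, 1)
    for name, count in zero_counts.items()
}

for name, share in zero_share.items():
    print(f"{name}: {share}%")
```

For the three delay columns the share exceeds 70%, so the median of each is 0 and the mean is driven by a minority of rows.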
Reproduction
| Analysis started | 2024-12-07 09:00:17.664082 |
|---|---|
| Analysis finished | 2024-12-07 09:02:40.579079 |
| Duration | 2 minutes and 22.91 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
Variables
ID_Base
Real number (ℝ)
| Distinct | 36195 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -2.236894 × 10^16 |
| Minimum | -9.223177 × 10^18 |
|---|---|
| Maximum | 9.2208921 × 10^18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 961232 |
| Negative (%) | 50.2% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | -9.223177 × 10^18 |
|---|---|
| 5-th percentile | -8.3353319 × 10^18 |
| Q1 | -4.5890671 × 10^18 |
| median | -4.5229877 × 10^16 |
| Q3 | 4.5637501 × 10^18 |
| 95-th percentile | 8.3281594 × 10^18 |
| Maximum | 9.2208921 × 10^18 |
| Range | -2.674985 × 10^15 |
| Interquartile range (IQR) | 9.1528171 × 10^18 |
Descriptive statistics
| Standard deviation | 5.3265411 × 10^18 |
|---|---|
| Coefficient of variation (CV) | -238.1222 |
| Kurtosis | -1.1929362 |
| Mean | -2.236894 × 10^16 |
| Median Absolute Deviation (MAD) | 4.5719204 × 10^18 |
| Skewness | 0.0095596013 |
| Sum | -1.8524912 × 10^18 |
| Variance | 2.8372041 × 10^37 |
| Monotonicity | Not monotonic |
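The negative Range reported for ID_Base cannot be a true maximum minus minimum. It is, however, what signed 64-bit wraparound would produce, since the values span almost the whole int64 range. The sketch below checks this under the assumption that the column is stored as int64; `wrap_int64` is a hypothetical helper, and the extremes are the rounded figures printed in the report, so the last digits are approximate:

```python
def wrap_int64(value: int) -> int:
    """Reduce a Python int to the signed 64-bit range (two's complement)."""
    return (value + 2**63) % 2**64 - 2**63

# Extremes as printed in the report (rounded to 10 significant digits).
id_min = -9_223_176_951_000_000_000
id_max = 9_220_892_138_000_000_000

true_range = id_max - id_min             # exact, arbitrary-precision arithmetic
wrapped_range = wrap_int64(true_range)   # what int64 subtraction would yield

print(f"true range:    {true_range:.7e}")
print(f"wrapped range: {wrapped_range:.6e}")
```

The wrapped value lands at about -2.675 × 10^15, in line with the reported Range, while the exact difference is about 1.844 × 10^19. The Sum row may be affected in the same way, because mean × count (about -4.28 × 10^22) is also outside the int64 range.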
| Value | Count | Frequency (%) |
| -2.094717035 × 10^18 | 287 | < 0.1% |
| -2.325125148 × 10^18 | 285 | < 0.1% |
| -2.850414882 × 10^18 | 280 | < 0.1% |
| 8.887303232 × 10^18 | 280 | < 0.1% |
| -7.17271808 × 10^18 | 280 | < 0.1% |
| 8.939332948 × 10^18 | 280 | < 0.1% |
| 3.654108814 × 10^18 | 280 | < 0.1% |
| -4.024428317 × 10^18 | 279 | < 0.1% |
| -9.995230551 × 10^17 | 279 | < 0.1% |
| 1.305051853 × 10^18 | 279 | < 0.1% |
| Other values (36185) | 1911307 | 99.9% |
| Value | Count | Frequency (%) |
| -9.223176951 × 10^18 | 5 | < 0.1% |
| -9.222235769 × 10^18 | 35 | < 0.1% |
| -9.221813993 × 10^18 | 194 | < 0.1% |
| -9.221103336 × 10^18 | 63 | < 0.1% |
| -9.220755073 × 10^18 | 4 | < 0.1% |
| -9.220659516 × 10^18 | 30 | < 0.1% |
| -9.220172063 × 10^18 | 20 | < 0.1% |
| -9.219634608 × 10^18 | 18 | < 0.1% |
| -9.218627477 × 10^18 | 23 | < 0.1% |
| -9.218606938 × 10^18 | 10 | < 0.1% |
| Value | Count | Frequency (%) |
| 9.220892138 × 10^18 | 15 | < 0.1% |
| 9.22087069 × 10^18 | 54 | < 0.1% |
| 9.219589171 × 10^18 | 14 | < 0.1% |
| 9.218406789 × 10^18 | 42 | < 0.1% |
| 9.218312429 × 10^18 | 49 | < 0.1% |
| 9.217323563 × 10^18 | 139 | < 0.1% |
| 9.217142214 × 10^18 | 4 | < 0.1% |
| 9.216409101 × 10^18 | 25 | < 0.1% |
| 9.216019188 × 10^18 | 16 | < 0.1% |
| 9.215471812 × 10^18 | 39 | < 0.1% |
ID_Timestamp
Real number (ℝ)
High correlation 
| Distinct | 10066 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4071087 × 10^9 |
| Minimum | 2.4070319 × 10^9 |
|---|---|
| Maximum | 2.4071424 × 10^9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 2.4070319 × 10^9 |
|---|---|
| 5-th percentile | 2.4070807 × 10^9 |
| Q1 | 2.4070914 × 10^9 |
| median | 2.4071109 × 10^9 |
| Q3 | 2.4071223 × 10^9 |
| 95-th percentile | 2.4071415 × 10^9 |
| Maximum | 2.4071424 × 10^9 |
| Range | 110496 |
| Interquartile range (IQR) | 30902 |
Descriptive statistics
| Standard deviation | 21511.693 |
|---|---|
| Coefficient of variation (CV) | 8.9367352 × 10^-6 |
| Kurtosis | -0.039654706 |
| Mean | 2.4071087 × 109 |
| Median Absolute Deviation (MAD) | 19400 |
| Skewness | -0.34028446 |
| Sum | 4.6074852 × 10^15 |
| Variance | 4.6275293 × 10^8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2407091633 | 679 | < 0.1% |
| 2407081633 | 676 | < 0.1% |
| 2407111633 | 665 | < 0.1% |
| 2407101633 | 658 | < 0.1% |
| 2407090733 | 647 | < 0.1% |
| 2407080733 | 645 | < 0.1% |
| 2407100733 | 642 | < 0.1% |
| 2407091533 | 642 | < 0.1% |
| 2407121633 | 640 | < 0.1% |
| 2407091733 | 640 | < 0.1% |
| Other values (10056) | 1907582 | 99.7% |
| Value | Count | Frequency (%) |
| 2407031857 | 3 | < 0.1% |
| 2407040236 | 4 | < 0.1% |
| 2407040245 | 2 | < 0.1% |
| 2407040253 | 2 | < 0.1% |
| 2407040302 | 4 | < 0.1% |
| 2407040312 | 4 | < 0.1% |
| 2407040313 | 21 | < 0.1% |
| 2407040314 | 1 | < 0.1% |
| 2407040317 | 45 | < 0.1% |
| 2407040319 | 9 | < 0.1% |
| Value | Count | Frequency (%) |
| 2407142353 | 5 | < 0.1% |
| 2407142352 | 3 | < 0.1% |
| 2407142351 | 19 | < 0.1% |
| 2407142350 | 5 | < 0.1% |
| 2407142349 | 1 | < 0.1% |
| 2407142348 | 19 | < 0.1% |
| 2407142347 | 2 | < 0.1% |
| 2407142346 | 11 | < 0.1% |
| 2407142345 | 7 | < 0.1% |
| 2407142344 | 8 | < 0.1% |
stop_number
Real number (ℝ)
High correlation 
| Distinct | 54 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.020377 |
| Minimum | 1 |
|---|---|
| Maximum | 54 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 9 |
| Q3 | 16 |
| 95-th percentile | 25 |
| Maximum | 54 |
| Range | 53 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 7.3168309 |
|---|---|
| Coefficient of variation (CV) | 0.66393656 |
| Kurtosis | 0.16188212 |
| Mean | 11.020377 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.84101858 |
| Sum | 21094279 |
| Variance | 53.536014 |
| Monotonicity | Not monotonic |
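The descriptive statistics are tied together by two identities: the coefficient of variation is the standard deviation divided by the mean, and the variance is the standard deviation squared. A quick consistency check using the stop_number figures; variable names are illustrative, and small differences are expected because the inputs are already rounded:

```python
# Values copied from the stop_number tables.
mean = 11.020377
std_dev = 7.3168309

# CV = standard deviation / mean; variance = standard deviation squared.
coefficient_of_variation = std_dev / mean
variance = std_dev ** 2

print(f"CV       = {coefficient_of_variation:.6f}")
print(f"Variance = {variance:.4f}")
```

Both values agree with the table to the precision of the printed inputs.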
| Value | Count | Frequency (%) |
| 2 | 137299 | 7.2% |
| 3 | 131099 | 6.8% |
| 4 | 127699 | 6.7% |
| 5 | 123202 | 6.4% |
| 6 | 119199 | 6.2% |
| 7 | 112947 | 5.9% |
| 8 | 104947 | 5.5% |
| 9 | 98293 | 5.1% |
| 10 | 94049 | 4.9% |
| 11 | 85393 | 4.5% |
| Other values (44) | 779989 | 40.7% |
| Value | Count | Frequency (%) |
| 1 | 14079 | 0.7% |
| 2 | 137299 | 7.2% |
| 3 | 131099 | 6.8% |
| 4 | 127699 | 6.7% |
| 5 | 123202 | 6.4% |
| 6 | 119199 | 6.2% |
| 7 | 112947 | 5.9% |
| 8 | 104947 | 5.5% |
| 9 | 98293 | 5.1% |
| 10 | 94049 | 4.9% |
| Value | Count | Frequency (%) |
| 54 | 7 | < 0.1% |
| 53 | 7 | < 0.1% |
| 52 | 6 | < 0.1% |
| 51 | 36 | < 0.1% |
| 50 | 33 | < 0.1% |
| 49 | 48 | < 0.1% |
| 48 | 58 | < 0.1% |
| 47 | 60 | < 0.1% |
| 46 | 67 | < 0.1% |
| 45 | 68 | < 0.1% |
IBNR
Real number (ℝ)
High correlation  Zeros 
| Distinct | 4011 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7748324.3 |
| Minimum | 0 |
|---|---|
| Maximum | 8099506 |
| Zeros | 65408 |
| Zeros (%) | 3.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 8000075 |
| Q1 | 8001897 |
| median | 8004230 |
| Q3 | 8011723 |
| 95-th percentile | 8089101 |
| Maximum | 8099506 |
| Range | 8099506 |
| Interquartile range (IQR) | 9826 |
Descriptive statistics
| Standard deviation | 1457840.3 |
|---|---|
| Coefficient of variation (CV) | 0.18814911 |
| Kurtosis | 24.269478 |
| Mean | 7748324.3 |
| Median Absolute Deviation (MAD) | 3034 |
| Skewness | -5.1237119 |
| Sum | 1.4831192 × 10^13 |
| Variance | 2.1252984 × 10^12 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 65408 | 3.4% |
| 8089028 | 8461 | 0.4% |
| 8004128 | 8081 | 0.4% |
| 8098549 | 7416 | 0.4% |
| 8004135 | 7384 | 0.4% |
| 8004129 | 7382 | 0.4% |
| 8004131 | 7374 | 0.4% |
| 8004132 | 7355 | 0.4% |
| 8089047 | 7194 | 0.4% |
| 8004136 | 6958 | 0.4% |
| Other values (4001) | 1781103 | 93.1% |
| Value | Count | Frequency (%) |
| 0 | 65408 | 3.4% |
| 8000001 | 481 | < 0.1% |
| 8000002 | 1 | < 0.1% |
| 8000004 | 334 | < 0.1% |
| 8000007 | 300 | < 0.1% |
| 8000009 | 443 | < 0.1% |
| 8000010 | 363 | < 0.1% |
| 8000011 | 568 | < 0.1% |
| 8000012 | 419 | < 0.1% |
| 8000013 | 957 | < 0.1% |
| Value | Count | Frequency (%) |
| 8099506 | 194 | < 0.1% |
| 8098553 | 4407 | 0.2% |
| 8098549 | 7416 | 0.4% |
| 8098348 | 4 | < 0.1% |
| 8098263 | 6232 | 0.3% |
| 8098205 | 1930 | 0.1% |
| 8098193 | 461 | < 0.1% |
| 8098147 | 2459 | 0.1% |
| 8098105 | 4854 | 0.3% |
| 8098096 | 4426 | 0.2% |
long
Real number (ℝ)
High correlation 
| Distinct | 3113 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.213232 |
| Minimum | 0.834032 |
|---|---|
| Maximum | 14.982644 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0.834032 |
|---|---|
| 5-th percentile | 6.838607 |
| Q1 | 8.454498 |
| median | 9.957668 |
| Q3 | 12.29175 |
| 95-th percentile | 13.523916 |
| Maximum | 14.982644 |
| Range | 14.148612 |
| Interquartile range (IQR) | 3.837252 |
Descriptive statistics
| Standard deviation | 2.3062346 |
|---|---|
| Coefficient of variation (CV) | 0.2258085 |
| Kurtosis | -1.2012031 |
| Mean | 10.213232 |
| Median Absolute Deviation (MAD) | 1.734955 |
| Skewness | 0.090998934 |
| Sum | 19549311 |
| Variance | 5.3187181 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 11.536537 | 8022 | 0.4% |
| 11.575386 | 7373 | 0.4% |
| 11.583234 | 7368 | 0.4% |
| 11.548572 | 7363 | 0.4% |
| 11.565619 | 7329 | 0.4% |
| 13.283966 | 7075 | 0.4% |
| 11.593049 | 6923 | 0.4% |
| 11.519245 | 6512 | 0.3% |
| 11.503669 | 6498 | 0.3% |
| 11.604971 | 6132 | 0.3% |
| Other values (3103) | 1843521 | 96.3% |
| Value | Count | Frequency (%) |
| 0.834032 | 259 | < 0.1% |
| 0.896632 | 260 | < 0.1% |
| 6.070715 | 1427 | 0.1% |
| 6.07384 | 894 | < 0.1% |
| 6.074485 | 1049 | 0.1% |
| 6.08378 | 262 | < 0.1% |
| 6.091499 | 441 | < 0.1% |
| 6.094486 | 1279 | 0.1% |
| 6.097265 | 807 | < 0.1% |
| 6.098877 | 286 | < 0.1% |
| Value | Count | Frequency (%) |
| 14.982644 | 271 | < 0.1% |
| 14.97908 | 189 | < 0.1% |
| 14.930408 | 259 | < 0.1% |
| 14.902088 | 248 | < 0.1% |
| 14.889318 | 278 | < 0.1% |
| 14.825531 | 304 | < 0.1% |
| 14.825234 | 267 | < 0.1% |
| 14.805774 | 41 | < 0.1% |
| 14.706775 | 259 | < 0.1% |
| 14.703529 | 276 | < 0.1% |
lat
Real number (ℝ)
| Distinct | 3118 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.936138 |
| Minimum | 47.417954 |
|---|---|
| Maximum | 55.021381 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 47.417954 |
|---|---|
| 5-th percentile | 48.105303 |
| Q1 | 49.379389 |
| median | 51.098294 |
| Q3 | 52.496254 |
| 95-th percentile | 53.59793 |
| Maximum | 55.021381 |
| Range | 7.6034266 |
| Interquartile range (IQR) | 3.116865 |
Descriptive statistics
| Standard deviation | 1.8569028 |
|---|---|
| Coefficient of variation (CV) | 0.036455509 |
| Kurtosis | -1.0465715 |
| Mean | 50.936138 |
| Median Absolute Deviation (MAD) | 1.418212 |
| Skewness | -0.071097754 |
| Sum | 97497676 |
| Variance | 3.4480882 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 48.142623 | 8022 | 0.4% |
| 48.137048 | 7373 | 0.4% |
| 48.134202 | 7368 | 0.4% |
| 48.141969 | 7363 | 0.4% |
| 48.139452 | 7329 | 0.4% |
| 52.500737 | 7075 | 0.4% |
| 48.129168 | 6923 | 0.4% |
| 48.14354 | 6512 | 0.3% |
| 48.144371 | 6498 | 0.3% |
| 48.12744 | 6132 | 0.3% |
| Other values (3108) | 1843521 | 96.3% |
| Value | Count | Frequency (%) |
| 47.4179544 | 259 | < 0.1% |
| 47.456591 | 213 | < 0.1% |
| 47.5058367 | 567 | < 0.1% |
| 47.513241 | 428 | < 0.1% |
| 47.5251713 | 290 | < 0.1% |
| 47.543785 | 260 | < 0.1% |
| 47.544341 | 49 | < 0.1% |
| 47.547219 | 258 | < 0.1% |
| 47.54792 | 285 | < 0.1% |
| 47.549143 | 264 | < 0.1% |
| Value | Count | Frequency (%) |
| 55.021381 | 286 | < 0.1% |
| 55.019862 | 281 | < 0.1% |
| 55.017947 | 264 | < 0.1% |
| 55.01765 | 273 | < 0.1% |
| 55.0149 | 249 | < 0.1% |
| 55.012455 | 290 | < 0.1% |
| 55.010432 | 309 | < 0.1% |
| 55.008077 | 259 | < 0.1% |
| 55.001937 | 260 | < 0.1% |
| 54.988543 | 303 | < 0.1% |
arrival_plan
Date
| Distinct | 10087 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 MiB |
| Minimum | 2024-07-07 23:32:00 |
|---|---|
| Maximum | 2024-07-14 23:58:00 |
departure_plan
Date
| Distinct | 10091 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 MiB |
| Minimum | 2024-07-07 23:32:00 |
|---|---|
| Maximum | 2024-07-14 23:58:00 |
arrival_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 110 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0484067 |
| Minimum | 0 |
|---|---|
| Maximum | 159 |
| Zeros | 1358742 |
| Zeros (%) | 71.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.1801873 |
|---|---|
| Coefficient of variation (CV) | 3.0333528 |
| Kurtosis | 116.05381 |
| Mean | 1.0484067 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.0092521 |
| Sum | 2006772 |
| Variance | 10.113591 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1358742 | 71.0% |
| 1 | 218086 | 11.4% |
| 2 | 112971 | 5.9% |
| 3 | 69359 | 3.6% |
| 4 | 38702 | 2.0% |
| 5 | 26169 | 1.4% |
| 6 | 18020 | 0.9% |
| 7 | 12762 | 0.7% |
| 8 | 10065 | 0.5% |
| 9 | 8115 | 0.4% |
| Other values (100) | 41125 | 2.1% |
| Value | Count | Frequency (%) |
| 0 | 1358742 | 71.0% |
| 1 | 218086 | 11.4% |
| 2 | 112971 | 5.9% |
| 3 | 69359 | 3.6% |
| 4 | 38702 | 2.0% |
| 5 | 26169 | 1.4% |
| 6 | 18020 | 0.9% |
| 7 | 12762 | 0.7% |
| 8 | 10065 | 0.5% |
| 9 | 8115 | 0.4% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 157 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 2 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 116 | 1 | < 0.1% |
| 110 | 7 | < 0.1% |
transformed_info_message
Categorical
High correlation 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 MiB |
Length
| Max length | 11 |
|---|---|
| Median length | 10 |
| Mean length | 10.025071 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | No message |
|---|---|
| 2nd row | No message |
| 3rd row | No message |
| 4th row | No message |
| 5th row | No message |
Common Values
| Value | Count | Frequency (%) |
| No message | 1379619 | 72.1% |
| Information | 266395 | 13.9% |
| Bauarbeiten | 140053 | 7.3% |
| Störung | 121627 | 6.4% |
| Großstörung | 6422 | 0.3% |
Words
| Value | Count | Frequency (%) |
| no | 1379619 | 41.9% |
| message | 1379619 | 41.9% |
| information | 266395 | 8.1% |
| bauarbeiten | 140053 | 4.3% |
| störung | 121627 | 3.7% |
| großstörung | 6422 | 0.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 3039344 | 15.8% |
| s | 2765660 | 14.4% |
| a | 1926120 | 10.0% |
| o | 1918831 | 10.0% |
| m | 1646014 | 8.6% |
| g | 1507668 | 7.9% |
| N | 1379619 | 7.2% |
| (space) | 1379619 | 7.2% |
| n | 800892 | 4.2% |
| r | 540919 | 2.8% |
| Other values (11) | 2284463 | 11.9% |
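The character counts can be rebuilt from the Common Values table: each label contributes its characters once per row in which it occurs. A minimal sketch with `collections.Counter`; the names are illustrative, and the row whose value cell is blank is taken to be the space in "No message":

```python
from collections import Counter

# Row counts per category, copied from the Common Values table.
category_counts = {
    "No message": 1_379_619,
    "Information": 266_395,
    "Bauarbeiten": 140_053,
    "Störung": 121_627,
    "Großstörung": 6_422,
}

# Weight every character of a label by the number of rows carrying that label.
char_counts = Counter()
for label, rows in category_counts.items():
    for ch in label:
        char_counts[ch] += rows

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(category_counts.values())

print(char_counts.most_common(3))
print(total_chars, round(mean_length, 6))
```

The same totals give the mean label length shown in the Length table, since it is total characters divided by the number of rows.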
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 19189149 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 19189149 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 19189149 | 100.0% |
prev_arrival_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 100 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.85598365 |
| Minimum | 0 |
|---|---|
| Maximum | 159 |
| Zeros | 1452766 |
| Zeros (%) | 75.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.8498008 |
|---|---|
| Coefficient of variation (CV) | 3.3292701 |
| Kurtosis | 135.76144 |
| Mean | 0.85598365 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.679339 |
| Sum | 1638452 |
| Variance | 8.1213646 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1452766 | 75.9% |
| 1 | 179718 | 9.4% |
| 2 | 95668 | 5.0% |
| 3 | 59382 | 3.1% |
| 4 | 32437 | 1.7% |
| 5 | 21623 | 1.1% |
| 6 | 14790 | 0.8% |
| 7 | 10335 | 0.5% |
| 8 | 8164 | 0.4% |
| 9 | 6581 | 0.3% |
| Other values (90) | 32652 | 1.7% |
| Value | Count | Frequency (%) |
| 0 | 1452766 | 75.9% |
| 1 | 179718 | 9.4% |
| 2 | 95668 | 5.0% |
| 3 | 59382 | 3.1% |
| 4 | 32437 | 1.7% |
| 5 | 21623 | 1.1% |
| 6 | 14790 | 0.8% |
| 7 | 10335 | 0.5% |
| 8 | 8164 | 0.4% |
| 9 | 6581 | 0.3% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 110 | 6 | < 0.1% |
| 109 | 2 | < 0.1% |
| 107 | 1 | < 0.1% |
| 106 | 2 | < 0.1% |
prev_departure_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 100 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.89956408 |
| Minimum | 0 |
|---|---|
| Maximum | 159 |
| Zeros | 1402573 |
| Zeros (%) | 73.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 2.8792037 |
|---|---|
| Coefficient of variation (CV) | 3.2006655 |
| Kurtosis | 133.35796 |
| Mean | 0.89956408 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.5940835 |
| Sum | 1721870 |
| Variance | 8.2898141 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1402573 | 73.3% |
| 1 | 214085 | 11.2% |
| 2 | 109271 | 5.7% |
| 3 | 59559 | 3.1% |
| 4 | 33058 | 1.7% |
| 5 | 21813 | 1.1% |
| 6 | 14820 | 0.8% |
| 7 | 10493 | 0.5% |
| 8 | 8284 | 0.4% |
| 9 | 6601 | 0.3% |
| Other values (90) | 33559 | 1.8% |
| Value | Count | Frequency (%) |
| 0 | 1402573 | 73.3% |
| 1 | 214085 | 11.2% |
| 2 | 109271 | 5.7% |
| 3 | 59559 | 3.1% |
| 4 | 33058 | 1.7% |
| 5 | 21813 | 1.1% |
| 6 | 14820 | 0.8% |
| 7 | 10493 | 0.5% |
| 8 | 8284 | 0.4% |
| 9 | 6601 | 0.3% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 137 | 1 | < 0.1% |
| 135 | 1 | < 0.1% |
| 134 | 2 | < 0.1% |
| 132 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 110 | 6 | < 0.1% |
| 109 | 1 | < 0.1% |
| 108 | 1 | < 0.1% |
| 106 | 2 | < 0.1% |
weighted_avg_prev_delay
Real number (ℝ)
High correlation  Zeros 
| Distinct | 42050 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.64702311 |
| Minimum | 0 |
|---|---|
| Maximum | 114.66667 |
| Zeros | 1053037 |
| Zeros (%) | 55.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.5 |
| 95-th percentile | 3 |
| Maximum | 114.66667 |
| Range | 114.66667 |
| Interquartile range (IQR) | 0.5 |
Descriptive statistics
| Standard deviation | 1.9529416 |
|---|---|
| Coefficient of variation (CV) | 3.0183491 |
| Kurtosis | 155.53854 |
| Mean | 0.64702311 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.2553331 |
| Sum | 1238477.3 |
| Variance | 3.8139808 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1053037 | 55.0% |
| 0.3333333333 | 13391 | 0.7% |
| 0.6666666667 | 11384 | 0.6% |
| 0.2 | 8948 | 0.5% |
| 0.5 | 8534 | 0.4% |
| 0.4 | 7818 | 0.4% |
| 0.2857142857 | 6704 | 0.4% |
| 1 | 5836 | 0.3% |
| 0.25 | 5704 | 0.3% |
| 0.1428571429 | 5366 | 0.3% |
| Other values (42040) | 787394 | 41.1% |
| Value | Count | Frequency (%) |
| 0 | 1053037 | 55.0% |
| 0.002898550725 | 1 | < 0.1% |
| 0.00303030303 | 1 | < 0.1% |
| 0.003361344538 | 4 | < 0.1% |
| 0.003565062389 | 8 | < 0.1% |
| 0.003787878788 | 22 | < 0.1% |
| 0.004032258065 | 19 | < 0.1% |
| 0.004301075269 | 35 | < 0.1% |
| 0.004597701149 | 30 | < 0.1% |
| 0.004926108374 | 38 | < 0.1% |
| Value | Count | Frequency (%) |
| 114.6666667 | 1 | < 0.1% |
| 110.0714286 | 1 | < 0.1% |
| 93.76190476 | 1 | < 0.1% |
| 93.33333333 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78.06666667 | 1 | < 0.1% |
| 77.19047619 | 1 | < 0.1% |
| 74 | 1 | < 0.1% |
| 72.52747253 | 1 | < 0.1% |
| 72.22222222 | 1 | < 0.1% |
max_station_number
Real number (ℝ)
High correlation 
| Distinct | 51 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.983771 |
| Minimum | 2 |
|---|---|
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 13 |
| median | 21 |
| Q3 | 27 |
| 95-th percentile | 33 |
| Maximum | 59 |
| Range | 57 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 8.4428088 |
|---|---|
| Coefficient of variation (CV) | 0.42248326 |
| Kurtosis | -0.60330558 |
| Mean | 19.983771 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.0059528072 |
| Sum | 38251256 |
| Variance | 71.281021 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 25 | 145112 | 7.6% |
| 28 | 124730 | 6.5% |
| 11 | 91716 | 4.8% |
| 26 | 90472 | 4.7% |
| 19 | 89712 | 4.7% |
| 27 | 88575 | 4.6% |
| 15 | 81795 | 4.3% |
| 21 | 69632 | 3.6% |
| 22 | 69053 | 3.6% |
| 24 | 65169 | 3.4% |
| Other values (41) | 998150 | 52.1% |
| Value | Count | Frequency (%) |
| 2 | 7108 | 0.4% |
| 3 | 11071 | 0.6% |
| 4 | 22477 | 1.2% |
| 5 | 29615 | 1.5% |
| 6 | 40845 | 2.1% |
| 7 | 38022 | 2.0% |
| 8 | 44943 | 2.3% |
| 9 | 49056 | 2.6% |
| 10 | 59370 | 3.1% |
| 11 | 91716 | 4.8% |
| Value | Count | Frequency (%) |
| 59 | 308 | < 0.1% |
| 54 | 299 | < 0.1% |
| 53 | 90 | < 0.1% |
| 51 | 80 | < 0.1% |
| 50 | 681 | < 0.1% |
| 49 | 36 | < 0.1% |
| 46 | 198 | < 0.1% |
| 45 | 160 | < 0.1% |
| 44 | 1662 | 0.1% |
| 43 | 7516 | 0.4% |
station_progress
Real number (ℝ)
High correlation 
| Distinct | 699 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.56738401 |
| Minimum | 0.026315789 |
|---|---|
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0.026315789 |
|---|---|
| 5-th percentile | 0.13333333 |
| Q1 | 0.33333333 |
| median | 0.57142857 |
| Q3 | 0.8 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 0.97368421 |
| Interquartile range (IQR) | 0.46666667 |
Descriptive statistics
| Standard deviation | 0.27158674 |
|---|---|
| Coefficient of variation (CV) | 0.47866478 |
| Kurtosis | -1.1422673 |
| Mean | 0.56738401 |
| Median Absolute Deviation (MAD) | 0.22857143 |
| Skewness | -0.073130352 |
| Sum | 1086038.8 |
| Variance | 0.073759357 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 129753 | 6.8% |
| 0.5 | 68250 | 3.6% |
| 0.6666666667 | 49182 | 2.6% |
| 0.3333333333 | 41100 | 2.1% |
| 0.75 | 35635 | 1.9% |
| 0.8 | 32477 | 1.7% |
| 0.6 | 31975 | 1.7% |
| 0.4 | 30340 | 1.6% |
| 0.25 | 26608 | 1.4% |
| 0.2 | 23084 | 1.2% |
| Other values (689) | 1445712 | 75.5% |
| Value | Count | Frequency (%) |
| 0.02631578947 | 101 | < 0.1% |
| 0.02702702703 | 26 | < 0.1% |
| 0.02941176471 | 1 | < 0.1% |
| 0.0303030303 | 367 | < 0.1% |
| 0.03125 | 60 | < 0.1% |
| 0.03225806452 | 4 | < 0.1% |
| 0.03333333333 | 16 | < 0.1% |
| 0.03448275862 | 61 | < 0.1% |
| 0.03571428571 | 1806 | 0.1% |
| 0.03703703704 | 1436 | 0.1% |
| Value | Count | Frequency (%) |
| 1 | 129753 | 6.8% |
| 0.9814814815 | 7 | < 0.1% |
| 0.98 | 47 | < 0.1% |
| 0.9782608696 | 5 | < 0.1% |
| 0.9777777778 | 3 | < 0.1% |
| 0.9772727273 | 31 | < 0.1% |
| 0.976744186 | 180 | < 0.1% |
| 0.9761904762 | 12 | < 0.1% |
| 0.9756097561 | 8 | < 0.1% |
| 0.975 | 12 | < 0.1% |
info_label_encoded
Categorical
High correlation 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 MiB |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 1379619 | 72.1% |
| 1 | 266395 | 13.9% |
| 2 | 140053 | 7.3% |
| 3 | 121627 | 6.4% |
| 4 | 6422 | 0.3% |
Words
| Value | Count | Frequency (%) |
| 0 | 1379619 | 72.1% |
| 1 | 266395 | 13.9% |
| 2 | 140053 | 7.3% |
| 3 | 121627 | 6.4% |
| 4 | 6422 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 1379619 | 72.1% |
| 1 | 266395 | 13.9% |
| 2 | 140053 | 7.3% |
| 3 | 121627 | 6.4% |
| 4 | 6422 | 0.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1914116 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1914116 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1914116 | 100.0% |
arrival_normalized
Real number (ℝ)
High correlation 
| Distinct | 10087 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.49137666 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.064120325 |
| Q1 | 0.24520087 |
| median | 0.49000594 |
| Q3 | 0.71413022 |
| 95-th percentile | 0.95012864 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.46892935 |
Descriptive statistics
| Standard deviation | 0.28114558 |
|---|---|
| Coefficient of variation (CV) | 0.57215899 |
| Kurtosis | -1.1533586 |
| Mean | 0.49137666 |
| Median Absolute Deviation (MAD) | 0.23965961 |
| Skewness | 0.076360953 |
| Sum | 940551.93 |
| Variance | 0.079042835 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.4713041757 | 372 | < 0.1% |
| 0.5333465268 | 318 | < 0.1% |
| 0.5303779933 | 317 | < 0.1% |
| 0.1029091629 | 317 | < 0.1% |
| 0.3366317039 | 316 | < 0.1% |
| 0.6728676034 | 316 | < 0.1% |
| 0.245398773 | 316 | < 0.1% |
| 0.09697209578 | 314 | < 0.1% |
| 0.381951316 | 314 | < 0.1% |
| 0.3878883831 | 314 | < 0.1% |
| Other values (10077) | 1910902 | 99.8% |
| Value | Count | Frequency (%) |
| 0 | 2 | < 0.1% |
| 0.0004947555907 | 2 | < 0.1% |
| 0.001286364536 | 1 | < 0.1% |
| 0.001781120127 | 5 | < 0.1% |
| 0.001880071245 | 2 | < 0.1% |
| 0.001979022363 | 7 | < 0.1% |
| 0.002077973481 | 5 | < 0.1% |
| 0.002176924599 | 53 | < 0.1% |
| 0.002275875717 | 116 | < 0.1% |
| 0.002374826836 | 72 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 2 | < 0.1% |
| 0.9999010489 | 18 | < 0.1% |
| 0.9998020978 | 41 | < 0.1% |
| 0.9997031466 | 47 | < 0.1% |
| 0.9996041955 | 54 | < 0.1% |
| 0.9995052444 | 62 | < 0.1% |
| 0.9994062933 | 56 | < 0.1% |
| 0.9993073422 | 46 | < 0.1% |
| 0.9992083911 | 85 | < 0.1% |
| 0.9991094399 | 70 | < 0.1% |
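The report does not say how arrival_normalized was derived, but its values are spaced as min-max scaling of arrival_plan at one-minute resolution would produce: the arrival_plan range (2024-07-07 23:32 to 2024-07-14 23:58) spans 10106 minutes, and 10105/10106 ≈ 0.9999010489. A sketch under that assumption; `min_max_normalize` is a hypothetical helper, not a function from the original pipeline:

```python
from datetime import datetime, timedelta

# Extremes of arrival_plan as listed in the report.
ARRIVAL_MIN = datetime(2024, 7, 7, 23, 32)
ARRIVAL_MAX = datetime(2024, 7, 14, 23, 58)

def min_max_normalize(ts: datetime) -> float:
    """Scale a timestamp linearly so ARRIVAL_MIN maps to 0.0 and ARRIVAL_MAX to 1.0."""
    return (ts - ARRIVAL_MIN) / (ARRIVAL_MAX - ARRIVAL_MIN)

span_minutes = (ARRIVAL_MAX - ARRIVAL_MIN) // timedelta(minutes=1)
one_minute_before_max = min_max_normalize(ARRIVAL_MAX - timedelta(minutes=1))
five_minutes_after_min = min_max_normalize(ARRIVAL_MIN + timedelta(minutes=5))

print(span_minutes, one_minute_before_max, five_minutes_after_min)
```

If this reading is right, arrival_normalized carries no information beyond arrival_plan, which would also account for its high correlation with ID_Timestamp and departure_normalized.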
departure_normalized
Real number (ℝ)
High correlation 
| Distinct | 10091 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.49144986 |
| Minimum | 0 |
|---|---|
| Maximum | 1 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 14.6 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.064120325 |
| Q1 | 0.24520087 |
| median | 0.49010489 |
| Q3 | 0.71422917 |
| 95-th percentile | 0.95022759 |
| Maximum | 1 |
| Range | 1 |
| Interquartile range (IQR) | 0.4690283 |
Descriptive statistics
| Standard deviation | 0.28114571 |
|---|---|
| Coefficient of variation (CV) | 0.57207405 |
| Kurtosis | -1.1533573 |
| Mean | 0.49144986 |
| Median Absolute Deviation (MAD) | 0.23965961 |
| Skewness | 0.076359725 |
| Sum | 940692.04 |
| Variance | 0.07904291 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0.4713041757 | 401 | < 0.1% |
| 0.1879081734 | 323 | < 0.1% |
| 0.04541856323 | 322 | < 0.1% |
| 0.6729665545 | 319 | < 0.1% |
| 0.3879873343 | 317 | < 0.1% |
| 0.2454977241 | 315 | < 0.1% |
| 0.04957451019 | 315 | < 0.1% |
| 0.5304769444 | 315 | < 0.1% |
| 0.4725905403 | 314 | < 0.1% |
| 0.4720957847 | 314 | < 0.1% |
| Other values (10081) | 1910861 | 99.8% |
| Value | Count | Frequency (%) |
| 0 | 2 | < 0.1% |
| 0.001286364536 | 1 | < 0.1% |
| 0.001781120127 | 4 | < 0.1% |
| 0.001880071245 | 2 | < 0.1% |
| 0.001979022363 | 7 | < 0.1% |
| 0.002077973481 | 5 | < 0.1% |
| 0.002176924599 | 53 | |
| 0.002275875717 | 111 | |
| 0.002374826836 | 70 | |
| 0.002473777954 | 43 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 8 | < 0.1% |
| 0.9999010489 | 30 | < 0.1% |
| 0.9998020978 | 45 | < 0.1% |
| 0.9997031466 | 61 | < 0.1% |
| 0.9996041955 | 52 | < 0.1% |
| 0.9995052444 | 57 | < 0.1% |
| 0.9994062933 | 47 | < 0.1% |
| 0.9993073422 | 74 | < 0.1% |
| 0.9992083911 | 71 | < 0.1% |
| 0.9991094399 | 89 | < 0.1% |
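
The largest values of `departure_normalized` are spaced about 0.0000989511 apart. If the column is a min-max scaling of the planned departure time at one-minute resolution (an assumption; the report does not say how the column was derived), that spacing implies a time span of roughly 10106 minutes, or about seven days. The sketch below works through that arithmetic. `STEP`, `SPAN_MINUTES`, and `minutes_from_start` are hypothetical names introduced for illustration.

```python
# Assumption: departure_normalized is a min-max scaled timestamp with
# one-minute resolution. The report does not confirm this.

# Gap between the two largest values listed in the table above.
STEP = 1 - 0.9999010489

# Number of one-minute steps that fit into the [0, 1] range.
SPAN_MINUTES = round(1 / STEP)


def minutes_from_start(normalized: float) -> int:
    """Map a normalized value back to whole minutes after the earliest time."""
    return round(normalized * SPAN_MINUTES)


print(SPAN_MINUTES)
```

Under the same assumption, the `arrival_normalized` values 0.125866 and 0.126460 in the first two sample rows map to offsets six minutes apart, which matches their planned arrivals at 20:44 and 20:50.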
Interactions
Correlations
| | IBNR | ID_Base | ID_Timestamp | arrival_delay_m | arrival_normalized | departure_normalized | info_label_encoded | lat | long | max_station_number | prev_arrival_delay_m | prev_departure_delay_m | station_progress | stop_number | transformed_info_message | weighted_avg_prev_delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IBNR | 1.000 | -0.001 | -0.001 | -0.152 | -0.001 | -0.001 | 0.051 | 0.273 | 0.523 | 0.166 | -0.117 | -0.118 | -0.038 | 0.086 | 0.051 | -0.107 |
| ID_Base | -0.001 | 1.000 | 0.001 | -0.001 | 0.001 | 0.001 | 0.016 | 0.003 | 0.004 | -0.007 | -0.001 | -0.000 | -0.001 | -0.004 | 0.016 | -0.001 |
| ID_Timestamp | -0.001 | 0.001 | 1.000 | -0.022 | 0.978 | 0.978 | 0.045 | 0.006 | 0.004 | 0.003 | -0.018 | -0.018 | -0.004 | -0.002 | 0.045 | -0.020 |
| arrival_delay_m | -0.152 | -0.001 | -0.022 | 1.000 | -0.024 | -0.024 | 0.013 | -0.257 | -0.106 | 0.129 | 0.610 | 0.626 | 0.156 | 0.223 | 0.013 | 0.585 |
| arrival_normalized | -0.001 | 0.001 | 0.978 | -0.024 | 1.000 | 1.000 | 0.050 | 0.005 | 0.003 | -0.000 | -0.020 | -0.019 | 0.004 | 0.002 | 0.050 | -0.020 |
| departure_normalized | -0.001 | 0.001 | 0.978 | -0.024 | 1.000 | 1.000 | 0.050 | 0.005 | 0.003 | -0.000 | -0.020 | -0.019 | 0.004 | 0.002 | 0.050 | -0.020 |
| info_label_encoded | 0.051 | 0.016 | 0.045 | 0.013 | 0.050 | 0.050 | 1.000 | 0.237 | 0.234 | 0.141 | 0.010 | 0.010 | 0.019 | 0.068 | 1.000 | 0.010 |
| lat | 0.273 | 0.003 | 0.006 | -0.257 | 0.005 | 0.005 | 0.237 | 1.000 | 0.243 | -0.004 | -0.230 | -0.244 | -0.014 | -0.014 | 0.237 | -0.245 |
| long | 0.523 | 0.004 | 0.004 | -0.106 | 0.003 | 0.003 | 0.234 | 0.243 | 1.000 | 0.114 | -0.107 | -0.111 | -0.022 | 0.057 | 0.234 | -0.103 |
| max_station_number | 0.166 | -0.007 | 0.003 | 0.129 | -0.000 | -0.000 | 0.141 | -0.004 | 0.114 | 1.000 | 0.174 | 0.150 | -0.130 | 0.553 | 0.141 | 0.277 |
| prev_arrival_delay_m | -0.117 | -0.001 | -0.018 | 0.610 | -0.020 | -0.020 | 0.010 | -0.230 | -0.107 | 0.174 | 1.000 | 0.830 | 0.167 | 0.270 | 0.010 | 0.748 |
| prev_departure_delay_m | -0.118 | -0.000 | -0.018 | 0.626 | -0.019 | -0.019 | 0.010 | -0.244 | -0.111 | 0.150 | 0.830 | 1.000 | 0.148 | 0.235 | 0.010 | 0.667 |
| station_progress | -0.038 | -0.001 | -0.004 | 0.156 | 0.004 | 0.004 | 0.019 | -0.014 | -0.022 | -0.130 | 0.167 | 0.148 | 1.000 | 0.681 | 0.019 | 0.318 |
| stop_number | 0.086 | -0.004 | -0.002 | 0.223 | 0.002 | 0.002 | 0.068 | -0.014 | 0.057 | 0.553 | 0.270 | 0.235 | 0.681 | 1.000 | 0.068 | 0.474 |
| transformed_info_message | 0.051 | 0.016 | 0.045 | 0.013 | 0.050 | 0.050 | 1.000 | 0.237 | 0.234 | 0.141 | 0.010 | 0.010 | 0.019 | 0.068 | 1.000 | 0.010 |
| weighted_avg_prev_delay | -0.107 | -0.001 | -0.020 | 0.585 | -0.020 | -0.020 | 0.010 | -0.245 | -0.103 | 0.277 | 0.748 | 0.667 | 0.318 | 0.474 | 0.010 | 1.000 |
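
The "High correlation" alerts in the overview can be cross-checked against this matrix by listing the pairs whose absolute coefficient reaches a cutoff. The sketch below does this for a handful of entries copied from the table. The 0.5 cutoff is an assumption that happens to be consistent with the alerts shown (0.523 and 0.553 are flagged, 0.474 is not); the report itself does not state the threshold. `CORR` and `high_pairs` are illustrative names.

```python
# A few entries copied from the correlation matrix above (one per pair).
CORR = {
    ("IBNR", "long"): 0.523,
    ("ID_Timestamp", "arrival_normalized"): 0.978,
    ("arrival_normalized", "departure_normalized"): 1.000,
    ("arrival_delay_m", "prev_departure_delay_m"): 0.626,
    ("stop_number", "max_station_number"): 0.553,
    ("stop_number", "weighted_avg_prev_delay"): 0.474,
    ("lat", "arrival_delay_m"): -0.257,
}


def high_pairs(corr, threshold=0.5):
    """Return pairs with |r| >= threshold, strongest first."""
    selected = [pair for pair, r in corr.items() if abs(r) >= threshold]
    return sorted(selected, key=lambda pair: -abs(corr[pair]))


for pair in high_pairs(CORR):
    print(pair, CORR[pair])
```

Pairs at or near 1.000, such as `arrival_normalized` with `departure_normalized` or `info_label_encoded` with `transformed_info_message`, carry almost the same information, so keeping both in a model is usually redundant.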
Missing values
Sample
First rows
| | ID_Base | ID_Timestamp | stop_number | IBNR | long | lat | arrival_plan | departure_plan | arrival_delay_m | transformed_info_message | prev_arrival_delay_m | prev_departure_delay_m | weighted_avg_prev_delay | max_station_number | station_progress | info_label_encoded | arrival_normalized | departure_normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1001326572688500578 | 2407082041 | 2 | 8011118.0 | 13.375988 | 52.509379 | 2024-07-08 20:44:00 | 2024-07-08 20:45:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.285714 | 0 | 0.125866 | 0.125965 |
| 1 | -1001326572688500578 | 2407082041 | 3 | 8011160.0 | 9.095851 | 48.849792 | 2024-07-08 20:50:00 | 2024-07-08 20:50:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.428571 | 0 | 0.126460 | 0.126460 |
| 2 | -1001326572688500578 | 2407082041 | 4 | 8011167.0 | 13.299437 | 52.530276 | 2024-07-08 20:55:00 | 2024-07-08 20:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.571429 | 0 | 0.126954 | 0.127053 |
| 3 | -1001326572688500578 | 2407082041 | 5 | 8010404.0 | 13.196898 | 52.534648 | 2024-07-08 21:00:00 | 2024-07-08 21:03:00 | 2.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.714286 | 0 | 0.127449 | 0.127746 |
| 4 | -1001326572688500578 | 2407082041 | 6 | 8080040.0 | 13.128917 | 52.549396 | 2024-07-08 21:06:00 | 2024-07-08 21:07:00 | 1.0 | No message | 2.0 | 0.0 | 0.666667 | 7 | 0.857143 | 0 | 0.128043 | 0.128142 |
| 5 | -1001326572688500578 | 2407082041 | 7 | 8081586.0 | 13.116810 | 52.552480 | 2024-07-08 21:08:00 | 2024-07-08 21:09:00 | 6.0 | No message | 1.0 | 1.0 | 0.761905 | 7 | 1.000000 | 0 | 0.128241 | 0.128340 |
| 6 | -1001326572688500578 | 2407092041 | 2 | 8011118.0 | 13.375988 | 52.509379 | 2024-07-09 20:44:00 | 2024-07-09 20:45:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.285714 | 0 | 0.268355 | 0.268454 |
| 7 | -1001326572688500578 | 2407092041 | 3 | 8011160.0 | 8.309970 | 54.920783 | 2024-07-09 20:50:00 | 2024-07-09 20:50:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.428571 | 0 | 0.268949 | 0.268949 |
| 8 | -1001326572688500578 | 2407092041 | 4 | 8011167.0 | 13.299437 | 52.530276 | 2024-07-09 20:55:00 | 2024-07-09 20:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.571429 | 0 | 0.269444 | 0.269543 |
| 9 | -1001326572688500578 | 2407092041 | 5 | 8010404.0 | 13.196898 | 52.534648 | 2024-07-09 21:00:00 | 2024-07-09 21:03:00 | 4.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.714286 | 0 | 0.269939 | 0.270236 |
Last rows
| | ID_Base | ID_Timestamp | stop_number | IBNR | long | lat | arrival_plan | departure_plan | arrival_delay_m | transformed_info_message | prev_arrival_delay_m | prev_departure_delay_m | weighted_avg_prev_delay | max_station_number | station_progress | info_label_encoded | arrival_normalized | departure_normalized |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1914106 | 999976718847540977 | 2407090447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-09 05:01:00 | 2024-07-09 05:02:00 | 1.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 | 0 | 0.175045 | 0.175143 |
| 1914107 | 999976718847540977 | 2407100447 | 2 | 8005241.0 | 7.018788 | 49.230425 | 2024-07-10 04:50:00 | 2024-07-10 04:51:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.333333 | 0 | 0.316446 | 0.316545 |
| 1914108 | 999976718847540977 | 2407100447 | 3 | 8005306.0 | 7.199622 | 51.177270 | 2024-07-10 04:50:00 | 2024-07-10 04:50:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.500000 | 0 | 0.316446 | 0.316446 |
| 1914109 | 999976718847540977 | 2407100447 | 4 | 8005332.0 | 7.057083 | 49.244018 | 2024-07-10 04:55:00 | 2024-07-10 04:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.666667 | 0 | 0.316940 | 0.317039 |
| 1914110 | 999976718847540977 | 2407100447 | 5 | 8005044.0 | 7.004241 | 51.160909 | 2024-07-10 04:56:00 | 2024-07-10 04:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.833333 | 0 | 0.317039 | 0.317039 |
| 1914111 | 999976718847540977 | 2407100447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-10 05:01:00 | 2024-07-10 05:02:00 | 1.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 | 0 | 0.317534 | 0.317633 |
| 1914112 | 999976718847540977 | 2407120447 | 2 | 8005241.0 | 7.018788 | 49.230425 | 2024-07-12 04:50:00 | 2024-07-12 04:51:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.333333 | 0 | 0.601425 | 0.601524 |
| 1914113 | 999976718847540977 | 2407120447 | 3 | 8005306.0 | 8.243728 | 50.070788 | 2024-07-12 04:50:00 | 2024-07-12 04:50:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.500000 | 0 | 0.601425 | 0.601425 |
| 1914114 | 999976718847540977 | 2407120447 | 4 | 8005332.0 | 7.057083 | 49.244018 | 2024-07-12 04:55:00 | 2024-07-12 04:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.666667 | 0 | 0.601920 | 0.602019 |
| 1914115 | 999976718847540977 | 2407120447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-12 05:01:00 | 2024-07-12 05:02:00 | 5.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 | 0 | 0.602513 | 0.602612 |
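
In the sample rows, `station_progress` equals `stop_number` divided by `max_station_number`, rounded to six decimals (for example 2/7 = 0.285714 and 6/6 = 1.000000). The report does not document how the feature was built, so the following is a sketch inferred from the sample only, and the function name is hypothetical.

```python
def station_progress(stop_number: int, max_station_number: int) -> float:
    """Fraction of the route completed, inferred from the sample rows."""
    if max_station_number <= 0:
        raise ValueError("max_station_number must be positive")
    return round(stop_number / max_station_number, 6)


# Reproduce a few values from the sample tables.
print(station_progress(2, 7))
print(station_progress(5, 6))
```

This relationship also explains why `station_progress` and `max_station_number` are both flagged as highly correlated with `stop_number`.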